Abstract: The granular association rule is a new approach to
revealing patterns hidden in the many-to-many relationships of
relational databases. Different types of data, such as nominal,
numeric, and multi-valued data, should be dealt with in the process
of rule mining. In this paper, we study multi-valued data and develop
techniques to filter out strong but uninteresting rules. An
example of such a rule might be "male students rate movies
released in 1990s that are not thriller." Rules of this kind, called
negative granular association rules, often overwhelm positive
ones, which are more useful. To address this issue, we filter out
negative granules such as "not thriller" in the process of granule
generation. In this way, only positive granular association rules
are generated and strong ones are mined. Experimental results
on the movielens data set indicate that most rules are negative,
and that our technique is effective in filtering them out.

Index Terms: Association rule, recommender system, multi-value,
positive granule, negative granule.

I. Introduction 

Granular association rule mining [1], [2] is a new approach to
building recommender systems [3], [4], [5]. The data model is
a many-to-many entity-relationship system (MMER), which is
composed of two information systems and a relation between
them [2]. For example, the movielens data set [6] is composed
of users, movies, and the ratings of movies by users. Suppose
we are interested in what kind of users rate what kind of
movies. "Women rate horror movies" and "male students rate
thriller movies released in 1995" might be two interesting
granular association rules. Here we observe that the two sides
of a granular association rule can involve different numbers of
attributes; therefore they form different granules [7], [8], [9],
[10], [11]. This is the major difference between this type of rule
and other relational association rules (e.g., [12], [13], [14]).

The original definition of the granular association rule
considers only nominal data. In applications, there are other
types such as numeric, multi-valued, and interval-valued data.
Numeric data might be the most important type in applications.
For example, in the movielens data set, each movie has a
release date. It is hard to construct strong rules using this
information directly, since few movies are released on the same
day. We would rather use the release year, or even the release
decade, to obtain coarser granules and stronger rules.
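The coarsening from release date to release decade can be sketched as follows. This is a minimal illustration, not the preprocessing actually used in the experiments; the function name `release_decade` and the sample dates are assumptions made for the example.

```python
from datetime import date


def release_decade(release: date) -> str:
    """Map a release date to a decade label such as '1990s'."""
    decade_start = (release.year // 10) * 10
    return f"{decade_start}s"


# Hypothetical release dates, used only for illustration.
dates = [date(1995, 1, 1), date(1997, 6, 13), date(1989, 12, 25)]
decades = [release_decade(d) for d in dates]
```

Movies released on different days of the same decade now share one attribute value, so the corresponding granule covers more objects.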

In this paper, we consider multi-valued data, which are also
common in applications. In the movielens data set, each movie
may belong to between 1 and 18 genres, including action, adventure,
children, and so on. However, multi-valued data cannot be
stored directly in relational databases. We need to preprocess
them before constructing information systems and MMERs.
There are at least three approaches to this issue.



1) Combine existing movie genres to form new ones. For
example, action + children is a new genre. In
this way, if a movie is action + children, it is
neither action nor children. Hence this approach
is unreasonable from the semantic viewpoint.

2) Assign a priority to each genre and keep only the
most important one for a movie [15]. For example, if
a movie is action + children, we will view it
only as action. The drawback is also obvious: many
interesting rules cannot be found.

3) Scale the movie genre attribute into 18 boolean
attributes. With this approach, we obtain "male students
rate movies released in 1990s that are not thriller,"
which is stronger than "male students rate thriller movies
released in 1990s." This is because each user rates
only a small fraction of all movies, and a negative granule
such as "not thriller" covers most movies. Therefore we need
to filter out uninteresting rules of this kind.
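The third approach, scaling, can be sketched as follows. This is a minimal illustration under stated assumptions: the genre list is a four-element stand-in for the 18 genres, and the function name `scale` is hypothetical.

```python
# Illustrative subset of genres; the data set described above has 18.
GENRES = ["action", "adventure", "children", "thriller"]


def scale(genres, all_genres=GENRES):
    """Replace a set-valued genre attribute by one 0/1 attribute per genre."""
    return {g: int(g in genres) for g in all_genres}


# A movie that is both action and children keeps both memberships.
scaled = scale({"action", "children"})
```

Unlike the first two approaches, no membership is lost: the movie is still action and still children. The price is that value 0 becomes a legal attribute value, which is where negative granules come from.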

We adopt the third approach and amend the drawback 
directly. For this purpose we define positive granules, positive 
granular association rules and negative ones. A granule is 
positive if and only if all attribute values of the scaled data 
are true. For example, "thriller movies released in 1990s" is 
a positive granule, while "movies released in 1990s that are 
not thriller" is a negative granule. A granular association rule 
is positive if and only if both sides of the rule are positive 
granules. For brevity, a positive (negative) granular association 
rule will be called a positive (negative) rule. 

We propose an algorithm with three main steps to mine
all strong positive rules satisfying the thresholds of four measures
[2]. Step 1: generate positive granules of length one. Step 2:
produce longer positive granules following the structure of the
Apriori algorithm [16]. Naturally, only positive granules
satisfying the coverage measures are kept. Step 3: generate candidate
rules by connecting positive granules on the two universes,
and check whether or not these rules satisfy the confidence
thresholds. A technique developed in [17] is employed to
speed up the third step.

Experiments are undertaken on the movielens data set [6].
We make two main observations. First, many interesting
rules are lost if we adopt the priority-assigning approach.
Therefore the priority-based approach is unacceptable. Second,
if we do not filter out negative granular association rules, they
will overwhelm positive ones. In fact, with threshold settings
that generate thousands of rules, not even one positive rule
is generated. In summary, our algorithm keeps all interesting
rules and at the same time filters out a large number of
uninteresting rules.



II. Positive rules 

In this section, we define positive granules and positive 
rules. Since granules and granular association rules have been 
well defined in [2], we will focus on new ones. 

A. Information systems and granules 

The data model is based on information systems and binary 
relations. 

Definition 1: $S = (U, A)$ is an information system, where
$U = \{x_1, x_2, \ldots, x_n\}$ is the set of all objects,
$A = \{a_1, a_2, \ldots, a_m\}$ is the set of all attributes, and
$a_j(x_i)$ is the value of $x_i$ on attribute $a_j$ for
$i \in [1..n]$ and $j \in [1..m]$.

User information of the movielens data set is stored in
an information system given by Table I(a), where $|U| = 943$
and $A = \{\text{User-id}, \text{Age}, \text{Gender}, \text{Occupation}\}$.
This table differs from its original version in two aspects. First, the
Zip-code attribute is removed since it is not useful in
the mining process. Second, the age of the user is discretized
according to given intervals $[0, 17]$, $[18, 24]$, ..., $[56, \infty)$. In
this way, all attributes in Table I(a) are nominal.

In an information system, any $A' \subseteq A$ induces an equivalence
relation [18], [19]

$$E_{A'} = \{(x, y) \in U \times U \mid \forall a \in A',\ a(x) = a(y)\}, \quad (1)$$

and partitions $U$ into a number of disjoint subsets called
blocks. The block containing $x \in U$ is

$$E_{A'}(x) = \{y \in U \mid \forall a \in A',\ a(y) = a(x)\}. \quad (2)$$

Definition 2: [20] A granule is a triple

$$G = (g, i(g), e(g)), \quad (3)$$

where $g$ is the name assigned to the granule, $i(g)$ is a
representation of the granule, and $e(g)$ is the set of objects that
are instances of the granule.

$g = g(A', x)$ is a natural name for the granule. Its representation
and its set of instances are

$$i(g(A', x)) = \bigwedge_{a \in A'} (a : a(x)), \quad (4)$$

$$e(g(A', x)) = E_{A'}(x). \quad (5)$$

The support of $g(A', x)$ is

$$\mathrm{supp}(g(A', x)) = \mathrm{supp}\Big(\bigwedge_{a \in A'} (a : a(x))\Big) = \frac{|E_{A'}(x)|}{|U|}. \quad (6)$$
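The block containing an object and the support of the corresponding granule can be sketched as follows. The four-row information system and the helper names `block_of` and `support` are assumptions made for illustration; they are not taken from the data set.

```python
# Toy information system: object id -> attribute values (hypothetical rows).
users = {
    1: {"Gender": "M", "Occupation": "technician"},
    2: {"Gender": "F", "Occupation": "other"},
    3: {"Gender": "M", "Occupation": "writer"},
    4: {"Gender": "M", "Occupation": "technician"},
}


def block_of(system, attrs, x):
    """Objects that agree with x on every attribute in attrs."""
    ref = system[x]
    return {y for y, row in system.items()
            if all(row[a] == ref[a] for a in attrs)}


def support(system, attrs, x):
    """Fraction of the universe covered by the block containing x."""
    return len(block_of(system, attrs, x)) / len(system)


male_block = block_of(users, ["Gender"], 1)
male_technician_support = support(users, ["Gender", "Occupation"], 1)
```

Adding attributes to the subset can only shrink the block, so longer granules never have larger support than their sub-granules.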

B. Scaled attributes and positive granules 

A multi-valued attribute has a power set as its domain.
In the movielens data set, there are 18 genres, including
action, children, adventure, etc. Since a movie
can be in several genres at once, the domain of genre
has $2^{18}$ values instead of 18. Attribute values include {action},
{children}, {adventure}, {action, children},
{action, adventure}, etc.; unknown corresponds to $\emptyset$.
This attribute can be replaced by 18 boolean attributes,
each indicating whether or not the movie is in the respective
genre. This technique is called scaling [21] and serves as the
foundation of formal concept analysis [22]. In fact, the original
data set contains the scaled information instead of the
multi-valued one. It is given by Table I(b). Here we use the release
decade instead of the release date to obtain coarser granules.

In order to describe this kind of data, we propose the 
following definition. 

Definition 3: Let $S = (U, A)$ be an information system.
Any $a \in A$ is a scaled attribute if $a(x) \in \{0, 1\}$ for every
$x \in U$, where $a(x) = 1$ indicates that $x$ has the attribute
specified by $a$, and $a(x) = 0$ otherwise.

With scaled attributes identified by the expert, we can focus 
on granules that are interesting to us. 

Definition 4: Let $S = (U, A)$ be an information system and
$A_D$ be the set of all scaled attributes. $G = (A', x)$, where
$x \in U$ and $A' \subseteq A$, is called a positive granule iff
$\forall a \in A' \cap A_D$, $a(x) = 1$.

In other words, a positive granule requires that all of its scaled
attributes take true values. With positive granules identified,
we can filter out unimportant granules from the very beginning.
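The positivity test can be sketched in a few lines. The movie row, the set `SCALED`, and the function name `is_positive` are hypothetical and serve only to illustrate the condition on scaled attributes.

```python
def is_positive(row, attrs, scaled_attrs):
    """True iff every scaled attribute used by the granule has value 1.

    Attributes that are not scaled (e.g. a release decade) are unconstrained.
    """
    return all(row[a] == 1 for a in attrs if a in scaled_attrs)


# Hypothetical movie: a 1990s thriller that is not an action movie.
movie = {"Release-decade": "1990s", "Thriller": 1, "Action": 0}
SCALED = {"Thriller", "Action"}

thriller_90s = is_positive(movie, ["Release-decade", "Thriller"], SCALED)
not_action_90s = is_positive(movie, ["Release-decade", "Action"], SCALED)
```

A granule that uses no scaled attribute at all is positive by this test, since the condition holds vacuously.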

C. Many-to-many entity-relationship systems 

Definition 5: Let $U = \{x_1, x_2, \ldots, x_n\}$ and
$V = \{y_1, y_2, \ldots, y_k\}$ be two sets of objects. Any
$R \subseteq U \times V$ is a binary relation from $U$ to $V$.
The neighborhood of $x \in U$ is

$$R(x) = \{y \in V \mid (x, y) \in R\}. \quad (7)$$

When $U = V$ and $R$ is an equivalence relation, $R(x)$ is the
equivalence class containing $x$. From this definition we know
immediately that for $y \in V$,

$$R^{-1}(y) = \{x \in U \mid (x, y) \in R\}. \quad (8)$$

An example of a binary relation is given by Table I(c), where
$U$ is the set of users as indicated by Table I(a), and $V$ is the
set of movies as indicated by Table I(b).
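The neighborhood of a user and the inverse neighborhood of a movie can be sketched as follows, assuming the relation is stored as a set of (user, movie) pairs. The pairs and the function names are made up for illustration.

```python
# Hypothetical rating relation: (user id, movie id) pairs.
R = {(1, 10), (1, 11), (2, 11), (3, 12)}


def neighborhood(R, x):
    """Movies related to user x."""
    return {v for (u, v) in R if u == x}


def inverse_neighborhood(R, y):
    """Users related to movie y."""
    return {u for (u, v) in R if v == y}


movies_of_user_1 = neighborhood(R, 1)
users_of_movie_11 = inverse_neighborhood(R, 11)
```

For large relations one would index the pairs by user and by movie instead of scanning the whole set on every call; the scan is kept here for brevity.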

Definition 6: [2] A many-to-many entity-relationship system
(MMER) is a 5-tuple $ES = (U, A, V, B, R)$, where $(U, A)$
and $(V, B)$ are two information systems, and $R \subseteq U \times V$ is
a binary relation from $U$ to $V$.

An example of an MMER is given by Table I.

D. Positive rules 

Granular association rules reveal patterns in the MMERs. 
They connect granules of two universes. 

Definition 7: [2] A granular association rule is an implication
of the form

$$(GR): \bigwedge_{a \in A'} (a : a(x)) \Rightarrow \bigwedge_{b \in B'} (b : b(y)), \quad (9)$$

where $A' \subseteq A$ and $B' \subseteq B$.

Definition 8: A granular association rule indicated by
Equation (9) is positive if both $(A', x)$ and $(B', y)$ are positive
granules.

TABLE I
A MANY-TO-MANY ENTITY-RELATIONSHIP SYSTEM

(a) User

User-id   Age        Gender   Occupation
1         [18, 24]   M        technician
2         [50, 55]   F        other
3         [18, 24]   M        writer
...       ...        ...      ...
943       [18, 24]   M        student

(b) Movie

Columns: Movie-id, Release-decade, Action, Adventure, Animation, ...,
Western. Rows shown: movies 1, 2, 3, ..., 1,682, all with
Release-decade 1990s; each genre column holds a boolean (0/1) value.

(c) Rates

Rows: User-id 1, 2, 3, ..., 943. Columns: Movie-id 1, 2, 3, 4, 5, ...,
1,682. An entry of 1 indicates that the user has rated the movie.

For brevity, in the following context granular association
rules will be simply called rules, and positive (negative)
granular association rules will be simply called positive (negative)
rules. According to the definition of a block, the set of objects
meeting the left-hand side of the rule is

$$LH(GR) = E_{A'}(x); \quad (10)$$

while the set of objects meeting the right-hand side is

$$RH(GR) = E_{B'}(y). \quad (11)$$

III. Positive rule mining 

In this section, we first revisit four measures that evaluate
the strength of a granular association rule [2]. Then we define a
positive rule mining problem. The problem is slightly different
from the previously defined one in that it requires only positive rules.
Finally we develop a rule mining algorithm for the new problem,
which is similar to the one proposed in [15].

A. Measures of granular association rule 

From the movielens data set, we may obtain a rule "35.5% of
male students rate at least 26.7% of thriller movies released in 1990s;
14.4% of users are male students and 12.5% of movies are thrillers
released in 1990s." Here 14.4%, 12.5%, 35.5%, and 26.7%
are the source coverage, the target coverage, the source
confidence, and the target confidence, respectively. These measures
are defined as follows. The source coverage of a rule is



$$scov(GR) = |LH(GR)| / |U|. \quad (12)$$

The target coverage of GR is

$$tcov(GR) = |RH(GR)| / |V|. \quad (13)$$

Algorithm 1 A backward algorithm
Input: $ES = (U, A, V, B, R)$, $ms$, $mt$, $sc$, $tc$.
Output: All positive rules satisfying given thresholds.
Method: backward
 1: $SG(ms) = \{(A', x) \in 2^A \times U \mid (A', x) \text{ is positive},\ |E_{A'}(x)| / |U| \ge ms\}$;
 2: $TG(mt) = \{(B', y) \in 2^B \times V \mid (B', y) \text{ is positive},\ |E_{B'}(y)| / |V| \ge mt\}$;
 3: for each $g' \in TG(mt)$ do
 4:   $Y = e(g')$; $X = \underline{R^{-1}}_{tc}(Y)$;
 5:   for each $g \in SG(ms)$ do
 6:     if ($|X \cap e(g)| / |e(g)| \ge sc$) then
 7:       output rule $i(g) \Rightarrow i(g')$;
 8:     end if
 9:   end for
10: end for

There is a tradeoff between the source confidence and the
target confidence of a rule. Consequently, neither value can be
obtained directly from the rule. To compute either of them,
we should specify the threshold of the other. Let $tc$ be the
target confidence threshold. The source confidence of the rule
is

$$sconf(GR, tc) = \frac{|\{x \in LH(GR) \mid |R(x) \cap RH(GR)| / |RH(GR)| \ge tc\}|}{|LH(GR)|}. \quad (14)$$
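The coverage and source confidence measures can be sketched as follows. This is an illustration under stated assumptions: the relation is a set of (user, movie) pairs, the comparison with tc is taken to be non-strict, and the toy universes and function names are hypothetical.

```python
def scov(lh, U):
    """Source coverage: share of the source universe matching the left-hand side."""
    return len(lh) / len(U)


def tcov(rh, V):
    """Target coverage: share of the target universe matching the right-hand side."""
    return len(rh) / len(V)


def sconf(lh, rh, R, tc):
    """Share of objects in lh related to at least a fraction tc of rh.

    Assumes a non-strict comparison (>= tc); returns 0.0 for empty sides.
    """
    if not lh or not rh:
        return 0.0
    hits = 0
    for x in lh:
        related = {v for (u, v) in R if u == x}
        if len(related & rh) / len(rh) >= tc:
            hits += 1
    return hits / len(lh)


# Toy MMER: four users, four movies, made-up ratings.
U, V = {1, 2, 3, 4}, {10, 11, 12, 13}
R = {(1, 10), (1, 11), (2, 10), (3, 12)}
lh, rh = {1, 2}, {10, 11}
```

Raising tc from 0.5 to 0.6 drops user 2, who rated only half of the target granule, which illustrates the tradeoff between the two confidence measures.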



B. The positive rule mining problem 

Now we propose a rule mining problem as follows.

Problem 1: The positive rule mining problem.

Input: An $ES = (U, A, V, B, R)$, a minimal source coverage
threshold $ms$, a minimal target coverage threshold $mt$, a
minimal source confidence threshold $sc$, and a minimal target
confidence threshold $tc$.

Output: All positive rules satisfying $scov(GR) \ge ms$,
$tcov(GR) \ge mt$, $sconf(GR) \ge sc$, and $tconf(GR) \ge tc$.

This problem is quite similar to the one discussed in [1],
[2], [15], [17]. The only difference is that positive rules instead
of all rules are output.

C. A backward algorithm 

We propose an algorithm to deal with Problem 1. The
algorithm is listed in Algorithm 1. It is very similar to the
algorithm proposed in [17]. The difference lies in the first
two lines. To implement these lines, we first produce positive
granules of length one, for example, "gender is male,"
"genre is thriller," and "release decade is 1990s." Then we
follow the structure of the Apriori algorithm to produce longer
positive granules, for example, "genre is thriller and release
decade is 1990s," or equivalently, "thriller movies released in
1990s." In this way, only positive granules are generated. The
conditions $|E_{A'}(x)| / |U| \ge ms$ and $|E_{B'}(y)| / |V| \ge mt$
ensure that only positive granules satisfying the coverage
measures are kept.
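The level-wise generation of positive granules can be sketched as follows. This is a simplified reading of the procedure, not the authors' implementation: it joins each granule with length-one granules on unused attributes and omits the candidate pruning of the full Apriori algorithm. The toy movie table and all names are assumptions.

```python
def positive_granules(system, scaled_attrs, min_cov):
    """Return {descriptor set: extent} for positive granules meeting min_cov.

    A descriptor is an (attribute, value) pair; scaled attributes are only
    allowed with value 1, so negative granules are never created.
    """
    n = len(system)
    singles = {}
    for obj, row in system.items():
        for a, v in row.items():
            if a in scaled_attrs and v != 1:
                continue  # skip negative descriptors such as (Thriller, 0)
            singles.setdefault(frozenset([(a, v)]), set()).add(obj)
    singles = {d: e for d, e in singles.items() if len(e) / n >= min_cov}

    result, level = dict(singles), dict(singles)
    while level:
        nxt = {}
        for desc, ext in level.items():
            used = {a for a, _ in desc}
            for single, s_ext in singles.items():
                a, _ = next(iter(single))
                if a in used:
                    continue
                joined = ext & s_ext  # extent of the longer granule
                if len(joined) / n >= min_cov:
                    nxt[desc | single] = joined
        result.update(nxt)
        level = nxt
    return result


# Hypothetical movie table with two scaled genre attributes.
movies = {
    1: {"Decade": "1990s", "Thriller": 1, "Action": 0},
    2: {"Decade": "1990s", "Thriller": 1, "Action": 1},
    3: {"Decade": "1980s", "Thriller": 0, "Action": 1},
    4: {"Decade": "1990s", "Thriller": 0, "Action": 0},
}
granules = positive_granules(movies, {"Thriller", "Action"}, min_cov=0.5)
```

Because coverage can only decrease when a descriptor is added, a granule that fails the threshold is never extended, which keeps the search space small.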

Lines 3 through 10 mine rules satisfying the confidence
measures. To explain this part of the algorithm, we should revisit
the definition of lower approximation on two universes.




Fig. 1. Rules mined through priority-based and scaling-based approaches (horizontal axis: ms (mt)).



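Definition 9: [17] Let $U$ and $V$ be two universes,
$R \subseteq U \times V$ be a binary relation, and $0 < \beta \le 1$ be a
user-specified threshold. The lower approximation of $X \subseteq U$
with respect to $R$ for threshold $\beta$ is

$$\underline{R}_{\beta}(X) = \left\{ y \in V \;\middle|\; \frac{|R^{-1}(y) \cap X|}{|X|} \ge \beta \right\}. \quad (15)$$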



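From this definition we know immediately that the lower
approximation of $Y \subseteq V$ with respect to $R$ is

$$\underline{R^{-1}}_{\beta}(Y) = \left\{ x \in U \;\middle|\; \frac{|R(x) \cap Y|}{|Y|} \ge \beta \right\}. \quad (16)$$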



Here $\beta$ corresponds to the target confidence instead. The
lower approximation can help speed up the mining process.
This issue has been discussed in [17], and a similar phenomenon
holds for our problem.
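The role of the lower approximation in the backward step can be sketched as follows. This is our reading of the procedure under stated assumptions: granules are given as name-to-extent dictionaries, comparisons are non-strict, and the toy data and function names are hypothetical.

```python
def lower_approx_inverse(R, U, Y, beta):
    """Objects of U related to at least a fraction beta of Y."""
    if not Y:
        return set()
    return {x for x in U
            if len({v for (u, v) in R if u == x} & Y) / len(Y) >= beta}


def backward_mine(source_granules, target_granules, R, U, sc, tc):
    """Pair source and target granules whose source confidence reaches sc.

    X is computed once per target granule and then reused for every
    source granule, which is where the speed-up comes from.
    """
    rules = []
    for t_name, Y in target_granules.items():
        X = lower_approx_inverse(R, U, Y, tc)
        for s_name, ext in source_granules.items():
            if ext and len(X & ext) / len(ext) >= sc:
                rules.append((s_name, t_name))
    return rules


# Toy data: made-up users, movies, ratings, and granule extents.
U = {1, 2, 3, 4}
R = {(1, 10), (1, 11), (2, 10), (3, 12)}
source = {"male": {1, 2, 3}, "female": {4}}
target = {"thriller": {10, 11}}
rules = backward_mine(source, target, R, U, sc=0.5, tc=0.5)
```

Here users 1 and 2 each cover at least half of the target granule, so two of the three objects in the "male" granule qualify and the rule passes the threshold sc = 0.5.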

IV. Experiments on the movielens data set 

The movielens data set [6] is widely used in recommender
systems (see, e.g., [23]). It contains 100,000 ratings
(1-5) from 943 users on 1,682 movies, with each user rating
at least 20 movies. The main purpose of our experiments is
to answer the following questions.

1) Does the priority-based approach lose important
information?

2) Is it necessary to remove negative rules?

A. Priority-based approach vs. scaling-based approach 

With the priority-based approach, we assign a priority to
each genre and keep only the most important one for a movie [15].
In this way, no negative rule exists. However, some
information is lost. In contrast, with the scaling-based approach,
no information is lost, and negative rules are filtered out by
Algorithm 1. Now we compare the numbers of positive rules
that are generated through these two approaches. We use the
following setting:

(Setting 1) $sc = tc = 0.1$, $ms = mt$, and $mt \in [0.05, 0.12]$.

Results are depicted in Figure 1. Here we observe that the
number of rules mined through the scaling-based approach is
more than twice that of the priority-based approach. Therefore the
information lost by the priority-based approach is unacceptable
in applications.




Fig. 2. sc = 0.12, tc = 0.15, ms = 0.1. (a) Number of positive rules; (b) Number of negative rules.



B. Influence of negative rules 

In applications, we want the recommender system to
generate a moderate number of rules. This number should not be too big;
otherwise it will be impossible for users to pick out interesting
and useful ones. Therefore we need to specify the thresholds on the
four measures carefully such that a few to a few hundred rules
are generated.

First, we generate rules using the algorithm presented in
[17]. We use the following setting:

(Setting 2) $ms = 0.1$, $mt = 0.85$, $sc = 0.12$, $tc = 0.15$.

With this setting we obtain 3,300 rules. Some of them are
given below:

(Rule 1) (Gender, Male) $\wedge$ (Occupation, Student) (136)
$\Rightarrow$ (Adventure, 0) $\wedge$ (Mystery, 0) (1488)
[scov = 0.144, tcov = 0.884, sconf = 0.132, tconf = 0.150]

(Rule 2) (Gender, Male) $\wedge$ (Age, (0, 18]) (136)
$\Rightarrow$ (Animation, 0) $\wedge$ (War, 0) (1569)
[scov = 0.144, tcov = 0.933, sconf = 0.132, tconf = 0.150]

The target coverage threshold is $mt = 0.85$; therefore
these rules are very strong from this viewpoint. Unfortunately,
they are all negative rules, and they are not very interesting.
Rule 1 is read as "Male students rate movies that are neither
Adventure nor Mystery; 136 users are male students and 1,488
movies are neither Adventure nor Mystery." It is
straightforward to compute the source/target coverage. The source
coverage of the rule is $136/943 \approx 0.1442 > 0.1$, and the
target coverage is $1488/1682 \approx 0.8847 > 0.85$. As discussed
earlier, we cannot obtain the source/target confidence directly.
We only know that they reach the given thresholds.

Second, we generate positive rules using Algorithm 1. We
use the following setting:

(Setting 3) $ms = 0.1$, $mt = 0.1$, $sc = 0.12$, $tc = 0.15$.

This setting differs from Setting 2 only in that $mt = 0.1$.
With this setting we obtain 72 positive rules. Some of them
are given below:

(Rule 3) (Gender, Male) $\wedge$ (Age, (0, 18]) (136)
$\Rightarrow$ (Year, 1990s) $\wedge$ (Action, 1) (206)
[scov = 0.144, tcov = 0.122, sconf = 0.250, tconf = 0.150]

(Rule 4) (Gender, Male) $\wedge$ (Occupation, Student) (136)
$\Rightarrow$ (Year, 1990s) $\wedge$ (Thriller, 1) (211)
[scov = 0.144, tcov = 0.125, sconf = 0.220, tconf = 0.150]

Rule 3 is read as "Young men no more than 18 years old
rate action movies released in 1990s." This is quite interesting
to us even though there are only 206 action movies released
in 1990s.
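As a quick check of the reported coverages, the following arithmetic uses only the granule sizes listed with Rules 3 and 4 (136, 206, and 211) and the universe sizes of 943 users and 1,682 movies stated for the data set:

```latex
\begin{align*}
scov(\text{Rule 3}) = scov(\text{Rule 4}) &= 136/943 \approx 0.1442 \ge ms = 0.1,\\
tcov(\text{Rule 3}) &= 206/1682 \approx 0.1225 \ge mt = 0.1,\\
tcov(\text{Rule 4}) &= 211/1682 \approx 0.1254 \ge mt = 0.1.
\end{align*}
```

Both target coverages clear the threshold of Setting 3 but would fail the threshold mt = 0.85 of Setting 2 by a wide margin.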

Third, we observe how the number of rules changes with
different settings of $mt$. Figure 2(a) shows the number of rules
for $mt \in [0.75, 1]$. Unfortunately, all of these rules are negative ones.
To produce positive rules, we should use a lower $mt$. Figure 2(b)
shows the number of positive rules for $mt \in [0.05, 0.35]$. We
observe that there would be no positive rules at all for $mt > 0.35$.
However, according to Figure 2(a), there are about 9,000
negative rules for $mt = 0.75$. Hence generating negative rules
for $mt > 0.35$ is impractical, since there are too many of
them. In other words, if we do not filter out negative granular
association rules, they will overwhelm positive ones.

C. Discussions 

Now we can answer the questions proposed at the beginning
of this section.

1) The priority-based approach loses important information
and rules. The scaling-based approach, on the other
hand, keeps all information and helps mine all positive
rules.

2) It is very important to remove negative rules because
they overwhelm positive ones.

V. Conclusions and further works 

In this paper, we deal with multi-valued data with two
objectives. The first is to keep all useful information such that
all interesting granular association rules can be mined. This
is achieved through attribute scaling. The second is to remove
strong but uninteresting rules. This is achieved through
filtering out negative granules and negative rules. In the future,
we will address other types of data, such as interval-valued data,
for more applications.